A METHOD FOR RECOVERING 3D SCENE STRUCTURE AND CAMERA
MOTION DIRECTLY FROM IMAGE INTENSITIES

RELATED APPLICATION 

The present application is related to U.S. application Serial No. ______, titled "A
Method for Recovering 3D Structure and Camera Motion from Points, Lines and/or
Directly from the Image Intensities," filed on ______ by the same inventor as the
present application, which related application is incorporated herein by reference.
BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention relates generally to a method for recovering the camera 
motion and 3D scene structure and, more particularly, to a linear algorithm for recovering 
the structure and motion directly from the image intensities where the camera moves 
along a line. 

2. Prior Art 

The science of rendering a 3D model from information derived from a 2D image 
predates computer graphics, having its roots in the fields of photogrammetry and 
computer vision. 

Photogrammetry is based on the basic idea that when a picture is taken, the 3D 
world is projected in perspective onto a flat 2D image plane. As a result, a feature in the 
2D image seen at a particular point actually lies along a particular ray beginning at the 
camera and extending out to infinity. By viewing the same feature in two different 
photographs the actual location can be resolved by constraining the feature to lie on the 
intersection of two rays. This process is known as triangulation. Using this process, any
point seen in at least two images can be located in 3D. With a sufficient number of points,
it is also possible to solve for unknown camera positions. The techniques of
photogrammetry and triangulation were used in such applications as creating topographic
maps from aerial images. However, the photogrammetry process is time intensive and
inefficient.
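
As a hedged illustration of the triangulation idea described above (it is not part of the method of the present invention), the following sketch locates a point from two viewing rays by the midpoint method. The function name `triangulate_midpoint` and the ray parameterization (camera center plus direction) are assumptions made for this example.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to the two rays c1 + s*d1 and c2 + t*d2."""
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    # Least-squares ray parameters minimizing |(c1 + s*d1) - (c2 + t*d2)|.
    A = np.stack([d1, -d2], axis=1)                      # 3 x 2 system matrix
    (s, t), *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)
    # Midpoint of the shortest segment joining the two rays.
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))
```

With noisy rays the two lines do not intersect exactly, which is why the midpoint of the shortest connecting segment is returned rather than an exact intersection.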

Computer vision techniques include recovering 3D scene structure from stereo
images, where correspondence between the two images is established automatically
via an iterative algorithm, which searches for matches between points in
order to reconstruct a 3D scene. It is also possible to solve for the camera position and
motion using 3D scene structure from stereo images.

Current computer techniques are focused on motion-based reconstruction and are
a natural application of computer technology to the problem of inferring 3D structure
(geometry) from 2D images. This is known as Structure-from-Motion. Structure from
motion (SFM), the problem of reconstructing an unknown 3D scene from multiple 2D
images of it, is one of the most studied problems in computer vision.

Most SFM algorithms that are currently known reconstruct the scene from previously
computed feature correspondences, usually tracked points. Other algorithms are direct
methods that reconstruct from the images' intensities without a separate stage of
correspondence computation. The method of the present invention presents a direct
method that is non-iterative, linear, and capable of reconstructing from arbitrarily many
images. Previous direct methods were limited to a small number of images, required
strong assumptions about the scene, usually planarity, or employed iterative optimization
and required a starting estimate.

These approaches have complementary advantages and disadvantages. Usually 
some fraction of the image data is of such low quality that it cannot be used to determine 
correspondence. Feature-based methods address this problem by pre-selecting a few
distinctive point or line features that are relatively easy to track, while direct methods 
attempt to compensate for the low quality of some of the data by exploiting the 
redundancy of the total data. Feature-based methods have the advantage that their input 
data is relatively reliable, but they neglect most of the available image information and 
only give sparse reconstructions of the 3D scene. Direct methods have the potential to 
give dense and accurate 3D reconstructions, due to their input data's redundancy, but 
they can be unduly affected by large errors in a fraction of the data. 

A method based on tracked lines is described in "A Linear Algorithm for Point
and Line Based Structure from Motion", M. Spetsakis, CVGIP 56:2 230-241, 1992,
where the original linear algorithm for 13 lines in 3 images was presented. An
optimization approach is disclosed in C.J. Taylor, D. Kriegmann, "Structure and Motion
from Line Segments in Multiple Images," PAMI 17:11 1021-1032, 1995. Additionally,
"A unified factorization algorithm for points, line segments and planes with
uncertainty models", D. Morris and T. Kanade, ICCV 696-702, 1998, describes work on
lines in an affine framework. A projective method for lines and points is described in
"Factorization methods for projective structure and motion", B. Triggs, CVPR 845-851,
1996, which involves computing the projective depths from a small number of frames.
In "In Defense of the Eight-Point Algorithm", PAMI 19, 580-593, 1995, Hartley presented a
full perspective approach that reconstructs from points and lines tracked over three
images.

The approach described in M. Irani, "Multi-Frame Optical Flow Estimation using
Subspace Constraints," ICCV 626-633, 1999 reconstructs directly from the image
intensities. The essential step of Irani for recovering correspondence is a multi-frame
generalization of the optical-flow approach described in B. Lucas and T. Kanade, "An
Iterative Image Registration Technique with an Application to Stereo Vision", IJCAI
674-679, 1981, which relies on a smoothness constraint and not on the rigidity constraint.
Irani uses the factorization of $D$ simply to fill out the entries of $D$ that could not be
computed initially. Irani writes the brightness constancy equation (equation (3) below) in
matrix form as $\Delta = -DI$, where $D$ tabulates the shifts $d_n^i$ and $I$ contains the intensity
gradients $\nabla I(p_n)$. Irani notes that $D$ has rank 6 (for a camera with known calibration),
which implies that $\Delta$ must have rank 6. To reduce the effects of noise, Irani projects the
observed $\Delta$ onto one of rank 6. Irani then applies a multi-image form of the Lucas-Kanade
approach to recovering optical flow, which yields a matrix equation $D I_2 = -\Delta_2$, where
the entries of $I_2$ are squared intensity gradients $I_a I_b$ summed over the "smoothing"
windows, and the entries of $\Delta_2$ have the form $I_a \Delta I$. Due to the added Lucas-Kanade
smoothing constraint, the shifts $D$ or $d_n^i$ can be computed as $D = -\Delta_2 [I_2]^{+}$, where $^{+}$
denotes the pseudo-inverse, except in smoothing windows where the image intensity is
constant in at least one direction. Using the rank constraint on $D$, Irani determines
additional entries of $D$ for the windows where the intensity is constant in one direction.

Any algorithm for small, linear motion confronts the aperture problem: the fact
that the data within small image windows do not suffice to determine the correspondence
unless one makes prior assumptions about the scene or motion. The aperture problem
makes correspondence recovery a difficult and sometimes impossible global task. To
avoid this, researchers typically impose a smoothness assumption. Lucas-Kanade uses a
smoothing technique to address the aperture problem.
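
The aperture problem can be made concrete with a small numerical sketch. In a Lucas-Kanade style window, the shift is determined by a 2x2 matrix of summed gradient products, and that matrix becomes singular when the intensity varies in only one direction. The helper name `window_gradient_eigenvalues` and the random test gradients are assumptions chosen for illustration only.

```python
import numpy as np

def window_gradient_eigenvalues(Ix, Iy):
    """Eigenvalues (ascending) of the 2x2 matrix of summed gradient
    products for one smoothing window."""
    Ix, Iy = np.ravel(Ix), np.ravel(Iy)
    G = np.array([[Ix @ Ix, Ix @ Iy],
                  [Ix @ Iy, Iy @ Iy]])
    return np.linalg.eigvalsh(G)

rng = np.random.default_rng(0)
# Window whose intensity varies only along x: one eigenvalue vanishes,
# so the shift along y cannot be determined from this window alone.
edge = window_gradient_eigenvalues(rng.normal(size=25), np.zeros(25))
# Window with gradients in both directions: both eigenvalues are positive.
corner = window_gradient_eigenvalues(rng.normal(size=25), rng.normal(size=25))
```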

SUMMARY OF THE INVENTION

The present invention is directed to a method for recovering 3D scene structure
and camera motion from image data obtained from a multi-image sequence, wherein a
reference image of the sequence is taken by a camera at a reference perspective and one
or more successive images of the sequence are taken at one or more successive different
perspectives by translating and/or rotating the camera, the method comprising the steps
of:

(a) determining image data shifts for each successive image with respect to the
reference image; the shifts being derived from the camera translation and/or rotation from
the reference perspective to the successive different perspectives;

(b) constructing a shift data matrix that incorporates the image data shifts for each
image;

(c) calculating a rank-1 factorization from the shift data matrix using SVD, with
one of the rank-1 factors being a vector corresponding to the 3D structure and the other
rank-1 factor being a vector corresponding to the size of the camera motions;

(d) dividing the successive images into smoothing windows;

(e) recovering the direction of camera motion from the first vector corresponding
to the 3D structure by solving a linear equation; and

(f) recovering the 3D structure by solving a linear equation using the recovered
camera motion.

In accordance with the method of the present invention, step (e) includes:

computing a first projection matrix;

recovering camera rotation vectors from the shift data matrix and the first
projection matrix;

computing a second projection matrix; and

recovering the direction of camera translation using the shift data matrix, the
reference image, the second projection matrix and the recovered camera rotation
vectors.

In addition, step (f) includes recovering the 3D structure from the shift data matrix,
the reference image, the recovered camera rotation vectors and the recovered direction of
translation vectors.

The method of the present invention further includes preliminary steps of
recovering the rotations of the camera between each successive image; and warping all
images in the sequence toward the reference image, while neglecting the translations.

The present invention provides an algorithm for linear camera motion, where the
camera moves roughly along a line, possibly with varying velocity and arbitrary
rotations. The approach of the present invention applies for calibrated or uncalibrated
cameras (the projective case). For specificity, we focus on the calibrated case, assuming
without loss of generality that the focal length is 1. The method is based on the brightness
constancy equation (BCE) and thus requires the motion and image displacements to be small
enough so that the intensity changes between images can be modeled by derivatives at
some resolution scale.
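
The small-displacement requirement can be illustrated numerically. The sketch below, which assumes a one-dimensional sinusoidal intensity profile chosen purely for illustration, measures how far the linearized relation (intensity change plus gradient times shift equals zero) is from holding as the shift grows. The function name `bce_residual` is hypothetical.

```python
import numpy as np

def bce_residual(shift, n=400):
    """Largest violation of dI + I' * shift = 0 for I0(x) = sin(x),
    where the second image is the first one displaced by `shift`."""
    x = np.linspace(0.0, 2.0 * np.pi, n)
    I0 = np.sin(x)
    I1 = np.sin(x - shift)        # feature at x in I0 appears at x + shift in I1
    dI = I1 - I0                  # intensity change at fixed pixel positions
    grad = np.cos(x)              # exact derivative of the reference profile
    return np.max(np.abs(dI + grad * shift))

small = bce_residual(0.01)        # residual is second order in the shift
large = bce_residual(0.5)         # linearization is visibly violated
```

In this toy setting the residual grows roughly with the square of the shift, which is consistent with treating the BCE as a first-order approximation.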

BRIEF DESCRIPTION OF THE DRAWINGS 

These and other features, aspects, and advantages of the methods of the present invention
will become better understood with regard to the following description, appended claims,
and accompanying drawings where:

FIG. 1 schematically illustrates a hardware implementation of the present invention.
FIG. 2 is a block diagram that illustrates the method of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 
Definitions

The method of the present invention assumes that the 3D structure is to be
recovered from an image sequence that consists of $N_I$ images of fixed size, each with
$N_p$ pixels. Let $p_n = (x_n, y_n)^T$ give the image coordinates of the $n$-th pixel position. Let $I^i$
denote the $i$-th image, with $i = 0, 1, \ldots, N_I - 1$, and let $I_n^i = I^i(p_n)$ denote the image
intensity at the $n$-th pixel position in $I^i$. We take $I^0$ as the reference image. Let $P_n$
denote the 3D point imaged at $p_n$ in the reference image, with $P_n = (X_n, Y_n, Z_n)^T$ in the
coordinate system of $I^0$. Let $d_n^i$ denote the shift in image position from $I^0$ to $I^i$ of the
3D feature point $P_n$. The motion of the camera is described as its translation and
rotation. Let $T^i = (T_x^i, T_y^i, T_z^i)^T$ represent the camera translation between the reference
image and image $i$, and let $R^i$ denote the camera rotation. In accordance with the method
of the present invention we parameterize a small rotation by the rotational velocity
$\omega^i = (\omega_x^i, \omega_y^i, \omega_z^i)^T$. Let a 3D point $P$ transform as $P' = R(P - T)$. Let
$p_n^i = (x_n^i, y_n^i)^T = p_n + d_n^i$ be the shifted position in $I^i$ of $p_n \in I^0$ resulting from the
motion $T^i, R^i$.

Given a vector $V$, define $[V]_2$ as the length-2 vector consisting of the first two
components of $V$. Let $\underline{V}$ denote the 2D image point corresponding to the 3D point $V$:
$\underline{V} = [V]_2 / V_z$. For a 2D vector $v$, define the corresponding 3D point $\bar{v} = [v^T\ 1]^T$. $R * v$
denotes the image point obtained from $v$ after a rotation: $R * v = \underline{(R\bar{v})}$. Let $\hat{V} = V/|V|$.

The three rotational flows of the camera, $r^{(1)}(x,y)$, $r^{(2)}(x,y)$, $r^{(3)}(x,y)$, are defined by

$$[r^{(1)}, r^{(2)}, r^{(3)}] \equiv \left[ \begin{pmatrix} -xy \\ -(1+y^2) \end{pmatrix}, \begin{pmatrix} 1+x^2 \\ xy \end{pmatrix}, \begin{pmatrix} -y \\ x \end{pmatrix} \right].$$

Let $\nabla I_n \equiv \nabla I(p_n)$ represent the (smoothed) gradient of the image intensities
$I^0(p_n)$ and define $(I_{xn}, I_{yn})^T \equiv \nabla I_n$. Similarly, let $\Delta I_n^i$ be the change in (smoothed)
intensity with respect to the reference image. With no smoothing, $\Delta I_n^i = I_n^i - I_n^0$. Let $\Delta$
be a $(N_I - 1) \times N_p$ matrix with entries $\Delta I_n^i$.
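
As a minimal sketch of these definitions, the intensity-change matrix and the reference-image gradients might be assembled as follows. The function name `intensity_change_matrix`, the use of `np.gradient` as the derivative filter, the absence of smoothing, and pixel units for the gradients are all assumptions of this example and are not specified by the text.

```python
import numpy as np

def intensity_change_matrix(images):
    """images: array of shape (N_I, rows, cols); images[0] is the reference I^0.
    Returns Delta with shape (N_I - 1, N_p) and the flattened gradients Ix, Iy
    of the reference image."""
    images = np.asarray(images, dtype=float)
    ref = images[0]
    Iy, Ix = np.gradient(ref)      # axis 0 = rows (y), axis 1 = columns (x)
    # Row i-1 of Delta holds I^i - I^0 at every pixel (no smoothing applied).
    Delta = (images[1:] - ref).reshape(images.shape[0] - 1, -1)
    return Delta, Ix.ravel(), Iy.ravel()
```

For a calibrated camera with focal length 1, the gradients would additionally need to be expressed in normalized image coordinates rather than pixels.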

Suppose $V^a$ is a set of quantities indexed by the integer $a$. The notation $\{V\}$ is used to
denote the vector with elements given by the $V^a$. Let the $(N_I - 1) \times 3$ matrices
$T \equiv [\{T_x\}\ \{T_y\}\ \{T_z\}]$ and $W \equiv [\{\omega_x\}\ \{\omega_y\}\ \{\omega_z\}]$ encode all translations and rotational
velocities for a sequence.

Define the $(N_I - 1) \times (N_I - 1)$ matrix $C$ with entries $C_{ii'} \equiv \delta_{ii'} + 1$.



Preliminary Analysis 

Before describing the method of the present invention, we shall describe the
preliminary analysis used to derive the translational and rotational flow vectors to be
applied in the algorithm. For small rotations and translations, the matrix of feature-shifts
$d_n^i$ is approximately bilinear in the unknown $T^i$, $Z_n^{-1}$. (We do not assume that the
rotations are small initially, but we can take them as small following their initial recovery
and compensation.) By contracting this matrix with the $\nabla I_n$, we get via the brightness
constancy equation (described later herein) a bilinear relation between the intensity
changes $\Delta I_n^i$ and the unknowns.

Derivation 

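The derivation of the flow vectors is described as follows. Up to noise, the
feature-shift $d_n^i$ can be written as

$$d_n^i = d_{Tn}^i + d_{Rn}^i, \qquad (1)$$

where

$$d_{Rn}^i \equiv p_n^i - (R^i)^{-1} * p_n^i, \qquad d_{Tn}^i \equiv \frac{Z_n^{-1}\left(T_z^i\, p_n - [T^i]_2\right)}{1 - Z_n^{-1} T_z^i}.$$

Here $d_{Rn}^i = d_{Rn}^i(R^i, p_n^i)$ represents the rotational part of the shift and $d_{Tn}^i$ represents
the translational part. When there is zero rotation, $d_n^i = d_{Tn}^i$. One can rewrite
$d_{Rn}^i = R^i * p_{Tn}^i - p_{Tn}^i$, where $p_{Tn}^i \equiv \underline{(P_n - T^i)} = p_n + d_{Tn}^i$. We assume small
translations and small residual rotations. Then $p_{Tn}^i \approx p_n + O(Z^{-1}\tau)$,

$$d_{Tn}^i \approx Z_n^{-1}\left(T_z^i\, p_n - [T^i]_2\right) + O\left((Z^{-1}\tau)^2\right), \qquad (2)$$

$$d_{Rn}^i \approx \omega_x^i\, r^{(1)}(p_n) + \omega_y^i\, r^{(2)}(p_n) + \omega_z^i\, r^{(3)}(p_n) + O(\omega^2, \omega Z^{-1}\tau),$$

where $Z^{-1}$, $\tau$, $\omega$ represent the average sizes of the $Z_n^{-1}$, the translations and the
residual rotations in radians. From "A Linear Solution for Multiframe Structure from
Motion", J. Oliensis, IUW 1225-1231, 1994 we get $\omega \sim Z^{-1}\tau$.

Then using the brightness constancy equation (BCE),

$$\Delta I_n^i + \nabla I_n \cdot d_n^i = 0, \qquad (3)$$

which holds up to corrections of $O(Z^{-2}\tau^2, \omega^2, \eta)$, where $\eta$ gives the typical size
of the noise in $I_n^i$. The brightness constancy equation and (2) imply that

$$-\Delta I_n^i \approx Z_n^{-1}\left(\nabla I_n \cdot p_n\, T_z^i - \nabla I_n \cdot [T^i]_2\right) + \nabla I_n \cdot \left(\omega_x^i\, r^{(1)}(p_n) + \omega_y^i\, r^{(2)}(p_n) + \omega_z^i\, r^{(3)}(p_n)\right).$$

Then we define the three length-$N_p$ translational flow vectors as

$$\Phi_x \equiv -\{Z^{-1} I_x\}, \quad \Phi_y \equiv -\{Z^{-1} I_y\}, \quad \Phi_z \equiv \{Z^{-1}(\nabla I \cdot p)\},$$

and also define the three length-$N_p$ rotational flow vectors as

$$\Psi_x \equiv \{\nabla I \cdot r^{(1)}(p)\}, \quad \Psi_y \equiv \{\nabla I \cdot r^{(2)}(p)\}, \quad \Psi_z \equiv \{\nabla I \cdot r^{(3)}(p)\}.$$

Then let $\Phi \equiv [\Phi_x, \Phi_y, \Phi_z]$ and $\Psi \equiv [\Psi_x, \Psi_y, \Psi_z]$. Then $-\Delta \approx T\Phi^T + W\Psi^T$.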
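
The bilinear relation between the intensity changes and the unknowns can be sketched in code. The block below builds the translational and rotational flow vectors from per-pixel coordinates, gradients and inverse depths, and evaluates the predicted intensity-change matrix. The function names and argument layout are assumptions of this example, and the rotational flows follow the first-order expansion of the transform $P' = R(P - T)$ for a small rotation.

```python
import numpy as np

def rotational_flows(x, y):
    """The three rotational flow fields r1, r2, r3 at image position (x, y)."""
    r1 = np.stack([-x * y, -(1.0 + y ** 2)], axis=-1)
    r2 = np.stack([1.0 + x ** 2, x * y], axis=-1)
    r3 = np.stack([-y, x], axis=-1)
    return r1, r2, r3

def flow_matrices(x, y, Ix, Iy, inv_Z):
    """Return Phi and Psi, each of shape (N_p, 3)."""
    grad = np.stack([Ix, Iy], axis=-1)
    # Psi columns: gradient contracted with each rotational flow.
    Psi = np.stack([(grad * r).sum(axis=-1) for r in rotational_flows(x, y)], axis=1)
    # Phi columns: translational flow vectors for T_x, T_y, T_z.
    Phi = np.stack([-inv_Z * Ix, -inv_Z * Iy, inv_Z * (Ix * x + Iy * y)], axis=1)
    return Phi, Psi

def predicted_intensity_change(T, W, Phi, Psi):
    """First-order prediction of Delta from -Delta = T Phi^T + W Psi^T."""
    return -(T @ Phi.T + W @ Psi.T)
```

Here `T` and `W` are the $(N_I - 1) \times 3$ matrices of translations and rotational velocities, so the prediction has the same shape as the intensity-change matrix.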

Then we define $H$ as a $(N_p - 3) \times N_p$ matrix that annihilates the three
vectors $\Psi_x, \Psi_y, \Psi_z$ and satisfies $HH^T = 1_{N_p-3}$, where $1_{N_p-3}$ is the identity matrix. One
can then compute $H$, and products involving $H$, in $O(N_p)$ using Householder matrices,
which are described in "A Linear Solution for Multiframe Structure from Motion", J.
Oliensis, IUW 1225-1231, 1994, and "A Multi-frame Structure from Motion Algorithm
under Perspective Projection", J. Oliensis, IJCV 34:2/3, 163-192, 1999, and Workshop on
Visual Scenes, 77-84, 1995. It then follows that

$$-\Delta H^T \approx T\Phi^T H^T \qquad (4)$$

up to $O(Z^{-2}\tau^2, \omega^2, \omega Z^{-1}\tau, \eta)$. In practice, we use equation (4) above left-multiplied by
$C^{-1/2}$, with $\Delta_{CH} \equiv C^{-1/2}\Delta H^T$. Multiplying by $C^{-1/2}$ reduces the bias due to singling out
the reference image for special treatment, a process described in "A Multi-frame
Structure from Motion Algorithm under Perspective Projection" which is referenced
above.

Equation (4) relates the data-matrix, on the left, to the translations and structure,
on the right. Multiplying by $H$ has eliminated the rotational effects up to second order.
These second order corrections include corrections of $O(\omega\eta)$, caused by errors in the
measured $\nabla I$ we use to define $H$. For small translations, $O(Z^{-1}\tau) \sim O(\omega)$ as described in
"Rigorous Bounds for Two-Frame Structure from Motion," J. Oliensis, IUW, 1225-1231,
1994, so all the corrections in equation (4) have similar
sizes: $O(Z^{-2}\tau^2) \sim O(\omega Z^{-1}\tau) \sim O(\omega^2)$. Therefore, multiplying by $H$ was crucial to reduce
the rotational corrections to the same order as the translational corrections.
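
A small-scale sketch of the annihilating matrix and of the bias-reduced data matrix is given below. It uses a dense QR decomposition instead of the Householder products referred to above, so it is only suitable for small pixel counts. The function names are hypothetical, the matrix $C$ is taken to be the identity plus the all-ones matrix, and $C^{-1/2}$ denotes its inverse symmetric square root, which is an interpretation assumed for this example.

```python
import numpy as np

def annihilator(Psi):
    """Return H with shape (N_p - 3, N_p), orthonormal rows, and H @ Psi = 0.
    Assumes the three columns of Psi are linearly independent."""
    Q, _ = np.linalg.qr(Psi, mode="complete")    # Q is an N_p x N_p orthogonal matrix
    return Q[:, Psi.shape[1]:].T                 # rows span the complement of Psi

def inverse_sqrt_C(n):
    """Inverse symmetric square root of the n x n matrix C = identity + all-ones."""
    C = np.eye(n) + np.ones((n, n))
    evals, evecs = np.linalg.eigh(C)             # C is symmetric positive definite
    return (evecs / np.sqrt(evals)) @ evecs.T

def delta_CH(Delta, Psi):
    """Bias-reduced data matrix C^{-1/2} Delta H^T, shape (N_I - 1, N_p - 3)."""
    return inverse_sqrt_C(Delta.shape[0]) @ Delta @ annihilator(Psi).T
```

Because the rows of `annihilator(Psi)` are orthogonal to the rotational flow vectors, any first-order rotational contribution of the form $W\Psi^T$ drops out of the result.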

Linear-Motion Algorithm 

The basic algorithm of the present invention for cases of linear camera motion is
more particularly described as follows.

0. Recover rotations and warp all images $I^1, \ldots, I^{N_I-1}$ toward the reference
image $I^0$, while neglecting the translations. Let the image displacements $d_n^i$ now
refer to the unrotated images.

1. Compute $H$ and $\Delta_{CH}$. Using the singular value decomposition, compute
the best rank-1 factorization of $-\Delta_{CH} \approx M^{(1)} S^{(1)T}$, where $M^{(1)}$, $S^{(1)}$ are vectors. If
the leading singular value of $-\Delta_{CH}$ is much larger than the rest, this confirms that
the motion is approximately linear and that the signal dominates the noise so that
the algorithm has enough information to proceed. $C^{1/2} M^{(1)}$ gives the translation
magnitudes up to an overall multiplicative scale.

2. Divide the image into small smoothing windows and take $Z_n^{-1}$ as constant
within each window. List the pixels so that those in the $k$-th smoothing window
have sequential indices $\eta_k, (\eta_k + 1), \ldots, (\eta_{k+1} - 1)$. Then compute a $N_p \times N_p$
projection matrix $P_\Omega$ which is block diagonal with zero entries between different
smoothing windows, and which annihilates the vectors $\{\nabla I \cdot p\}$, $\{I_x\}$, and $\{I_y\}$.
Then solve the overconstrained system of equations

$$P_\Omega\left(H^T S^{(1)} - \Psi w\right) = 0 \qquad (5)$$

for the 3-vector $w$.

To complete the method of the Linear-Motion Algorithm, compute a $N_p \times N_p$
projection matrix $P_{\hat{T}}$, which is block diagonal with zero entries between different
smoothing windows and annihilates $H^T S^{(1)} - \Psi w$, where $w$ is the vector recovered
previously. Then solve for the direction of translation $\hat{T}$ via

$$P_{\hat{T}}\left(-\hat{T}_x \{I_x\} - \hat{T}_y \{I_y\} + \hat{T}_z \{p \cdot \nabla I\}\right) = 0. \qquad (6)$$

Finally, recover $Z_n^{-1}$ via

$$\left(H^T S^{(1)}\right)_n - [\Psi w]_n = Z_n^{-1}\left(\hat{T}_z\, p_n - [\hat{T}]_2\right) \cdot \nabla I_n. \qquad (7)$$
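
Step 2 and equations (5) to (7) can be sketched as follows, under stated assumptions: the window projectors are applied block by block via QR instead of being formed as full matrices, equation (5) is solved by ordinary least squares, equation (6) is solved by taking the right singular vector with the smallest singular value, and the overall sign is fixed by requiring mostly positive inverse depths. The function names and the `windows` argument (a list of slices) are hypothetical.

```python
import numpy as np

def apply_block_projector(basis, v, windows):
    """Within each window, remove from v its component in the span of the
    corresponding rows of `basis` (a block-diagonal projector)."""
    out = np.array(v, dtype=float, copy=True)
    for sl in windows:
        Q, _ = np.linalg.qr(basis[sl])           # orthonormal basis for this block
        out[sl] -= Q @ (Q.T @ out[sl])
    return out

def linear_motion_step2(v, Psi, Ix, Iy, x, y, windows):
    """v plays the role of H^T S^(1) (length N_p). Returns w, the unit
    translation direction, and one inverse depth per smoothing window."""
    p_dot_grad = x * Ix + y * Iy
    # Equation (5): P_Omega (v - Psi w) = 0, solved for w by least squares.
    omega_basis = np.stack([p_dot_grad, Ix, Iy], axis=1)
    w, *_ = np.linalg.lstsq(apply_block_projector(omega_basis, Psi, windows),
                            apply_block_projector(omega_basis, v, windows),
                            rcond=None)
    g = v - Psi @ w                              # left-hand side of equation (7)
    # Equation (6): the second projector annihilates g window by window.
    B = np.stack([-Ix, -Iy, p_dot_grad], axis=1)
    PB = apply_block_projector(g[:, None], B, windows)
    t_hat = np.linalg.svd(PB, full_matrices=False)[2][-1]
    # Equation (7): per-window least squares for the inverse depth.
    a = B @ t_hat                                # (T_z p - [T]_2) . grad I per pixel
    inv_Z = np.array([(a[sl] @ g[sl]) / (a[sl] @ a[sl]) for sl in windows])
    if np.median(inv_Z) < 0:                     # resolve the overall sign ambiguity
        t_hat, inv_Z = -t_hat, -inv_Z
    return w, t_hat, inv_Z
```

On real data the recovered inverse depths carry the same overall scale ambiguity as the rank-1 factorization, so only their relative values are meaningful in this sketch.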

Linear-Motion Algorithm Analysis

Step 2.

From (4), $S^{(1)} \approx H\{Z^{-1}(\hat{T}_z\, p - [\hat{T}]_2) \cdot \nabla I\}$. Then since $H^T H$ is a projection matrix
annihilating the $\Psi_x, \Psi_y, \Psi_z$, it follows that

$$H^T S^{(1)} \approx \{Z^{-1}(\hat{T}_z\, p - [\hat{T}]_2) \cdot \nabla I\} + \Psi w \qquad (8)$$

for some $w$. Since the matrix $P_\Omega$ annihilates the first term on the right hand side of (8),
we get (5). $P_\Omega$ and its products can be computed in $O(N_p)$. Solving (5) for $w$ neglects
the constraints that $\hat{T}$ is the same in each smoothing window, a total of $2(N_w - 1)$
constraints, where $N_w$ is the number of windows. Then applying $P_{\hat{T}}$ to (8) gives (6).
Because we omitted $2(N_w - 1)$ constraints, Step 2 gives a suboptimal estimate of
$\hat{T}$ and $Z_n^{-1}$. As before, one can base Step 2 on a multi-frame reestimate of $I^0$, with,
as before, the caveat that if the original noise in $I^0$ is less than that in the recomputed $I^0$, one
should use $I^0$ directly.

The linear-motion algorithm extends to deal with a camera translating on a plane
or in all 3D directions. The number $N_L$ of large singular values of $-\Delta_{CH}$ determines the
dimensionality of the motion, e.g., planar motion corresponds to $N_L = 2$. For each large
singular value, the corresponding singular vector gives rise to an equation similar to (8),
which can be solved as before for $\hat{T}$, where each singular vector yields a different $\hat{T}$.
One recovers the $Z_n^{-1}$ from equations of the form of (7).
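
The singular-value test of Step 1 and the count of large singular values described above might be sketched as follows. The threshold parameter `gap`, which decides when a singular value counts as large, is an assumption of this example; the text does not specify a numerical criterion.

```python
import numpy as np

def dominant_factors(A, gap=10.0):
    """SVD of A. Returns the singular values, the number of 'large' ones
    (those within a factor `gap` of the largest), and the leading rank-1
    factors M1 (motion side) and S1 (structure side, unit norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    n_large = int(np.sum(s > s[0] / gap))
    M1 = U[:, 0] * s[0]
    S1 = Vt[0]
    return s, n_large, M1, S1
```

A count of one is consistent with approximately linear motion, while a count of two would suggest planar motion, in which case each large singular value contributes its own pair of factors.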

Implementation 

It will be apparent to those skilled in the art that the methods of the present
invention disclosed herein may be embodied and performed completely by software
contained in an appropriate storage medium for controlling a computer.

Fig. 1 illustrates in block-diagram form a computer hardware
system incorporating the invention. As indicated therein, the system includes a video
source 101, whose output is digitized into a pixel map by a digitizer 102. The digitized
video frames are then sent in electronic form via a system bus 103 to a storage device 104
for access by the main system memory during usage. During usage the operation of the
system is controlled by a central-processing unit (CPU) 105, which controls the access to
the digitized pixel map and the invention. The computer hardware system will include
those standard components well-known to those skilled in the art for accessing and
displaying data and graphics, such as a monitor 106 and graphics board 107.

The user interacts with the system by way of a keyboard 108 and/or a mouse 109
or other position-sensing device such as a track ball, which can be used to select items on
the screen or direct functions of the system.

The execution of the key tasks associated with the present invention is directed by
instructions stored in the main memory of the system, which is controlled by the CPU.
The CPU can access the main memory and perform the steps necessary to carry out the
method of the present invention in accordance with stored instructions that govern CPU
operation. Specifically, the CPU, in accordance with the input of a user, will access the
stored digitized video and, in accordance with the instructions embodied in the present
invention, will analyze the selected video images in order to extract the 3D structure
information from the associated digitized pixel maps.

Referring now to Fig. 2, the method of the present invention will be described in
relation to the block diagram. A first image in a sequence is taken by a camera at a
reference perspective and one or more successive images are taken by moving the camera
along a substantially linear path to one or more successive different perspectives in step
201. The images are then digitized 202 for analysis of the 3D image content, i.e. image
intensities. From the digitized 3D image content, image data shifts are determined for each
successive image 203 with respect to the reference image, the shifts being derived from
the camera translation and/or rotation from the reference perspective to the successive
different perspectives.

A shift data matrix that incorporates the image data shifts for each image is then
constructed 204. The shift data matrix is then used to calculate a rank-1 factorization
using SVD, one rank-1 factor being a vector corresponding to
the 3D structure and the other rank-1 factor being a vector corresponding to the
camera motion 205. The successive images are divided into smoothing windows 206 and
the camera motion is recovered from the factorization vectors between the smoothing
windows by solving a linear equation 207. Finally, the 3D structure is recovered by
solving a linear equation using the recovered camera motion 208.

While there has been shown and described what is considered to be preferred
embodiments of the invention, it will, of course, be understood that various modifications
and changes in the form or detail could readily be made without departing from the spirit
of the invention. It is therefore intended that the invention be not limited to the exact
forms described and illustrated, but should be construed to cover all modifications that
may fall within the scope of the appended claims.

WHAT IS CLAIMED IS:

1. A method for recovering 3D scene structure and camera motion from image data
obtained from a multi-image sequence, wherein a reference image of the sequence is
taken by a camera at a reference perspective and one or more successive images of the
sequence are taken at one or more successive different perspectives by translating and/or
rotating the camera, the method comprising the steps of:

(a) determining image data shifts for each successive image with respect to the
reference image; the shifts being derived from the camera translation and/or rotation from
the reference perspective to the successive different perspectives;

(b) constructing a shift data matrix that incorporates the image data shifts for each
image;

(c) calculating a rank-1 factorization from the shift data matrix using SVD, with
one of the rank-1 factors being a vector corresponding to the 3D structure and the other
rank-1 factor being a vector corresponding to the size of the camera motions;

(d) dividing the successive images into smoothing windows;

(e) recovering the direction of camera motion from the first vector corresponding
to the 3D structure by solving a linear equation; and

(f) recovering the 3D structure by solving a linear equation using the recovered
camera motion.

2. The method of claim 1, wherein step (e) includes:

computing a first projection matrix;

recovering camera rotation vectors from the shift data matrix and the first
projection matrix;

computing a second projection matrix; and

recovering the direction of camera translation using the shift data matrix, the
reference image, the second projection matrix and the recovered camera rotation
vectors.

3. The method of claim 2, wherein step (f) includes recovering the 3D structure from
the shift data matrix, the reference image, the recovered camera rotation vectors and the
recovered direction of translation vectors.

4. The method of claim 1, further including preliminary steps of:

recovering the rotations of the camera between each successive image; and

warping all images in the sequence toward the reference image, while neglecting
the translations.

5. The method of claim 1, wherein step (b) comprises:

computing $H$ and $\Delta_{CH}$, where $H$ is a $(N_p - 3) \times N_p$ matrix defined so that $HH^T$ is
the identity matrix and $H$ annihilates the three vectors $\Psi_x, \Psi_y, \Psi_z$, where the three
vectors are computed from the reference image as

$$\Psi_x \equiv \{\nabla I \cdot r^{(1)}(p)\}, \quad \Psi_y \equiv \{\nabla I \cdot r^{(2)}(p)\}, \quad \Psi_z \equiv \{\nabla I \cdot r^{(3)}(p)\},$$

where $r^{(1)}(x,y)$, $r^{(2)}(x,y)$, $r^{(3)}(x,y)$ are defined by

$$[r^{(1)}, r^{(2)}, r^{(3)}] = \left[ \begin{pmatrix} -xy \\ -(1+y^2) \end{pmatrix}, \begin{pmatrix} 1+x^2 \\ xy \end{pmatrix}, \begin{pmatrix} -y \\ x \end{pmatrix} \right],$$

and $\Delta$ is a shift data matrix that gives the difference in intensities between each
successive image and the reference image and is a $(N_I - 1) \times N_p$ matrix with entries $\Delta I_n^i$,
where $\Delta I_n^i$ is the change in (smoothed) intensity with respect to the reference image,
and with no smoothing $\Delta I_n^i = I_n^i - I_n^0$, where $N_I$ is the number of images, $N_p$ is the
number of pixels, and where $I^i$ denotes the $i$-th image, with $i = 0, 1, \ldots, N_I - 1$, and where
$I_n^i = I^i(p_n)$ denotes the image intensity at the $n$-th pixel position in $I^i$, where $I^0$ is the
reference image, where $x$ and $y$ are the image coordinates of the pixel position and
$p = (x, y)$, and where $\Delta_{CH} = C^{-1/2}\Delta H^T$, where $C$ is a constant matrix with entries
$C_{ii'} = \delta_{ii'} + 1$, and where the
notation $\{V\}$ is used to denote a vector with elements given by the $V^a$.

6. The method of claim 1, wherein step (c) comprises:

computing a rank-1 factorization of $-\Delta_{CH} \approx M^{(1)} S^{(1)T}$, where $M^{(1)}$, $S^{(1)}$ are
vectors corresponding to the motion and structure respectively.

7. The method of claim 1, wherein step (c) comprises:

computing a rank-3 factorization of $-\Delta_{CH} \approx \sum_a M^{(a)} S^{(a)T}$, where $M^{(a)}$, $S^{(a)}$ are
vectors corresponding to the motion and structure respectively;

setting $Z_n^{-1}$ as constant within each window, where $Z$ is the depth from the
camera to a 3D scene along the camera's optical axis;

listing each of the pixels so that those in the $k$-th smoothing window have
sequential indices $\eta_k, (\eta_k + 1), \ldots, (\eta_{k+1} - 1)$;

computing a first projection matrix by computing a $N_p \times N_p$ projection matrix $P_\Omega$
which is block diagonal with zero entries between different smoothing windows, and
which annihilates the vectors $\{\nabla I \cdot p\}$, $\{I_x\}$ and $\{I_y\}$, where $\{\nabla I\}$ is a vector containing the
gradient of the intensity at each pixel, and $I_x$ and $I_y$ are the gradients of the intensity in the
reference image in the $x$ and $y$ directions;

recovering the three camera rotation vectors includes solving the following
equations

$$P_\Omega\left(H^T S^{(a)} - \Psi w^{(a)}\right) = 0$$

for the 3-vector $w^{(a)}$;

computing a second projection matrix includes computing a $N_p \times N_p$ projection
matrix $P_{\hat{T}}^{(a)}$, which is block diagonal with zero entries between different smoothing
windows and annihilates $H^T S^{(a)} - \Psi w^{(a)}$, where $w^{(a)}$ is the vector recovered
previously;

recovering the directions of camera translations by solving for the directions of
translation $\hat{T}^{(a)}$ via

$$P_{\hat{T}}^{(a)}\left(-\hat{T}_x^{(a)} \{I_x\} - \hat{T}_y^{(a)} \{I_y\} + \hat{T}_z^{(a)} \{p \cdot \nabla I\}\right) = 0; \text{ and}$$

recovering $Z_n^{-1}$ via

$$\left(H^T S^{(a)}\right)_n - [\Psi w^{(a)}]_n = \tau^{(a)}\left(Z_n^{-1}\left(\hat{T}_z^{(a)} p_n - [\hat{T}^{(a)}]_2\right) \cdot \nabla I_n\right),$$

where $[\hat{T}^{(a)}]_2$ represents $(\hat{T}_x^{(a)}, \hat{T}_y^{(a)})^T$,
the $x$ and $y$ components of the translation direction, and where $\tau^{(a)}$ are constants.



8. The method of claim 1, wherein step (d) comprises:

setting $Z_n^{-1}$ as constant within each window, where $Z$ is the depth from the camera
to a 3D scene along the camera's optical axis;

listing each of the pixels so that those in the $k$-th smoothing window have
sequential indices $\eta_k, (\eta_k + 1), \ldots, (\eta_{k+1} - 1)$.

9. The method of claim 2, wherein the step of computing a first projection matrix
includes computing a $N_p \times N_p$ projection matrix $P_\Omega$ which is block diagonal with zero
entries between different smoothing windows, and which annihilates the vectors
$\{\nabla I \cdot p\}$, $\{I_x\}$ and $\{I_y\}$, where $\{\nabla I\}$ is a vector containing the gradient of the intensity at
each pixel, and $I_x$ and $I_y$ are the gradients of the intensity in the reference image in the $x$
and $y$ directions.

10. The method of claim 2, wherein the step of recovering camera rotation vectors
includes solving the following equation

$$P_\Omega\left(H^T S^{(1)} - \Psi w\right) = 0$$

for the 3-vector $w$.

11. The method of claim 9, wherein the step of computing a second projection matrix
includes computing a $N_p \times N_p$ projection matrix $P_{\hat{T}}$, which is block diagonal with zero
entries between different smoothing windows and annihilates $H^T S^{(1)} - \Psi w$, where $w$ is
the vector recovered previously.

12. The method of claim 2, wherein the step of recovering the direction of camera
translation includes solving for the direction of translation $\hat{T}$ via

$$P_{\hat{T}}\left(-\hat{T}_x \{I_x\} - \hat{T}_y \{I_y\} + \hat{T}_z \{p \cdot \nabla I\}\right) = 0.$$

13. The method of claim 3, wherein step (f) includes recovering $Z_n^{-1}$ via

$$\left(H^T S^{(1)}\right)_n - [\Psi w]_n = Z_n^{-1}\left(\hat{T}_z\, p_n - [\hat{T}]_2\right) \cdot \nabla I_n,$$

where $[\hat{T}]_2$ represents $(\hat{T}_x, \hat{T}_y)^T$, the $x$ and $y$
components of the translation direction.

ABSTRACT OF THE INVENTION

The present invention is directed to a method for recovering 3D scene
structure and camera motion from image data obtained from a multi-image sequence,
wherein a reference image of the sequence is taken by a camera at a reference perspective
and one or more successive images of the sequence are taken at one or more successive
different perspectives by translating and/or rotating the camera, the method comprising
the steps of determining image data shifts for each successive image with respect to the
reference image; the shifts being derived from the camera translation and/or rotation from
the reference perspective to the successive different perspectives;
constructing a shift data matrix that incorporates the image data shifts for each image;
calculating a rank-1 factorization from the shift data matrix using SVD, with one of the
rank-1 factors being a vector corresponding to the 3D structure and the other rank-1
factor being a vector corresponding to the size of the camera motions; dividing the
successive images into smoothing windows; recovering the direction of camera motion
from the first vector corresponding to the 3D structure by solving a linear equation; and
recovering the 3D structure by solving a linear equation using the recovered camera
motion. In accordance with the present invention, the method includes computing a first
projection matrix; recovering camera rotation vectors from the shift data matrix and the
first projection matrix; computing a second projection matrix; and recovering the
direction of camera translation using the shift data matrix, the reference image, the
second projection matrix and the recovered camera rotation vectors. In addition, the
method includes recovering the 3D structure from the shift data matrix, the reference
image, the recovered camera rotation vectors and the recovered direction of translation
vectors.